NSF PAR Search | NSF Public Access Repository

Temporally Streaming Audio-Visual Synchronization for Real-World Videos

https://doi.org/10.1109/WACV61041.2025.00490

Voas, Jordan; Tseng, Wei-Cheng; Berry, Layne; Hu, Xixi; Peng, Puyuan; Stuedemann, James; Harwath, David (February 2025, IEEE)

Free, publicly-accessible full text available February 26, 2026

Multimodal Contextualized Semantic Parsing from Speech

https://doi.org/10.18653/v1/2024.acl-long.398

Voas, Jordan; Harwath, David; Mooney, Ray (August 2024, Association for Computational Linguistics)

Full Text Available

Multimodal Contextualized Semantic Parsing from Speech

https://doi.org/10.48550/arXiv.2406.06438

Voas, Jordan; Mooney, Raymond; Harwath, David (June 2024, arXiv)

This paper introduces Semantic Parsing in Contextual Environments (SPICE), a task aimed at improving artificial agents’ contextual awareness by integrating multimodal inputs with prior contexts. Unlike traditional semantic parsing, SPICE provides a structured and interpretable framework for dynamically updating an agent’s knowledge with new information, reflecting the complexity of human communication. To support this task, the authors develop the VG-SPICE dataset, which challenges models to construct visual scene graphs from spoken conversational exchanges, emphasizing the integration of speech and visual data. They also present the Audio-Vision Dialogue Scene Parser (AViD-SP), a model specifically designed for VG-SPICE. Both the dataset and model are released publicly, with the goal of advancing multimodal information processing and integration.
Full Text Available

What is the Best Automated Metric for Text to Motion Generation?

https://doi.org/10.1145/3610548.3618185

Voas, Jordan; Wang, Yili; Huang, Qixing; Mooney, Raymond (December 2023, ACM SIGGRAPH Asia)

Full Text Available